127 research outputs found

A Framework for High-Accuracy Privacy-Preserving Mining

Authors: Agrawal Shipra; Haritsa Jayant R.
Publication date: 01/01/2004

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here with regard to frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors than the prior techniques.

arXiv.org e-Print Archive

CiteSeerX

Open Access Repository of IISc Research Publications

Providing Diversity in K-Nearest Neighbor Query Results

Authors: Haritsa Jayant R.; Jain Anoop; Sarda Parag
Publication date: 15/10/2003

Given a point query Q in multi-dimensional space, K-Nearest Neighbor (KNN) queries return the K closest answers in the database with respect to Q, according to a given distance metric. In this scenario, it is possible that a majority of the answers may be very similar to one another, especially when the data has clusters. For a variety of applications, such homogeneous result sets may not add value to the user. In this paper, we consider the problem of providing diversity in the results of KNN queries, that is, to produce the closest result set such that each answer is sufficiently different from the rest. We first propose a user-tunable definition of diversity, and then present an algorithm, called MOTLEY, for producing a diverse result set as per this definition. Through a detailed experimental evaluation on real and synthetic data, we show that MOTLEY can produce diverse result sets by reading only a small fraction of the tuples in the database. Further, it imposes no additional overhead on the evaluation of traditional KNN queries, thereby providing a seamless interface between diversity and distance. Comment: 20 pages, 11 figures

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

Transaction Scheduling in Firm Real-Time Database Systems

Author: Haritsa Jayant R
Publication venue: University of Wisconsin-Madison Department of Computer Sciences
Publication date: 01/01/1991

Minds@University of Wisconsin

The Web is the Database

Author: Jayant R. Haritsa
Publication date: 01/01/2000

Search engines are currently the standard medium for locating and accessing information on the Web. However, they may not scale to match the anticipated explosion of Web content since they support only extremely coarse-grained queries and are based on centralized architectures. In this pape

CiteSeerX

Open Access Repository of IISc Research Publications

The Picasso Database Query Optimizer Visualizer

Author: Jayant R. Haritsa
Publication date: 01/09/2010

Modern database systems employ a query optimizer module to automatically identify the most efficient strategies for executing the declarative SQL queries submitted by users. The efficiency of these strategies, called “plans”, is measured in terms of “costs” that ar

CiteSeerX

Approximate Analysis of Real-Time Database Systems

Author: Jayant R. Haritsa
Publication venue: IEEE
Publication date: 01/01/1994

During the past few years, several studies have been made on the performance of real-time database systems with respect to the number of transactions that miss their deadlines. These studies have used either simulation models or database testbeds as their performance evaluation tools. We present here a preliminary analytical performance study of real-time transaction processing. Using a series of approximations, we derive simple closed-form solutions to reduced real-time database models. Although quantitatively approximate, the solutions accurately capture system sensitivity to workload parameters and indicate conditions under which performance bounds are achieved.

CiteSeerX

Open Access Repository of IISc Research Publications